import helper_files as hf
if (1): hf.show_banner()
Steller sea lions in the western Aleutian Islands have declined 94 percent in the last 30 years. The endangered western population, found in the North Pacific, is the focus of conservation efforts that require annual population counts. Currently, it takes biologists up to four months to count sea lions from the thousands of aerial photographs NOAA Fisheries collects each year. The objective of this competition is to develop an algorithm that automates the counting and classification of sea lions from over 18,000 aerial photographs (100 GB) taken each year.
This project is very similar to the Traffic Sign Classifier (CarND P2) and Vehicle Detection (CarND P5) projects. Instead of classifying traffic signs, we are classifying and counting different types of Sea Lions. The first part of the project detects the locations of sea lions from training images. These coordinates are then used to extract sample images of sea lions to train the deep learning models. The NVIDIA Deep Learning Model Architecture is then used as a proof of concept to verify the quality of the training data for classification (similar to the Traffic Sign Classifier).
This project is also similar to Vehicle Detection, where Sliding Windows were used to create heatmaps for detecting and tracking vehicles. Sliding Windows could be used to detect, classify, and count Sea Lions, but these images are very large, which makes the approach infeasible. Let's assume the average test image is (5000 x 4000) and the sliding windows are (100 x 100). This would create 2000 windows to analyze (without any overlap). If we generously say that an AWS GPU (g2.2xlarge) instance can process one Test Image per second, the entire test set would require 5 hours of GPU time!
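As a quick back-of-the-envelope check (the image size, window size, and one-image-per-second throughput are the assumptions stated above):
# Rough cost of a naive (non-overlapping) sliding-window search.
img_w, img_h = 5000, 4000          # assumed average test image size
win = 100                          # assumed window size
windows_per_image = (img_w // win) * (img_h // win)
print (windows_per_image)          # 2000 windows per image
n_test_images = 18000
sec_per_image = 1.0                # generous g2.2xlarge assumption
print (n_test_images * sec_per_image / 3600, 'hours')  # 5.0 hours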
The project code shows that test image quality is still very high even with 50% compression in each direction. This reduces the file size of the test set from 100 GB to 30 GB, which will greatly speed up analysis. However, Sliding Windows is still a brute-force approach, and we definitely need to find a more efficient solution.
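For reference, a minimal sketch of that 50% downscale (the file name is illustrative; cv2.INTER_AREA is one reasonable choice for shrinking, though the resize function later in this notebook uses the cv2.resize default):
import cv2
import matplotlib.image as mpimg
# Halving each dimension keeps 1/4 of the pixels; the JPEG test set
# shrinks from 100 GB to about 30 GB.
img = mpimg.imread('./input/Test/1.jpg')   # illustrative file name
h, w = img.shape[:2]
small = cv2.resize(img, (w // 2, h // 2), interpolation=cv2.INTER_AREA)
mpimg.imsave('./input_resize/Test/1.jpg', small)
print (img.shape, '->', small.shape)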
Darknet (YOLO v2) is an extension of the original YOLO (You Only Look Once) model architecture. This recent model (Dec 2016) has accuracy similar to SSD and Faster R-CNN, but at a much higher frame rate. The Darknet model architecture should be the perfect deep learning network for counting and classifying the different types of sea lions.
https://arxiv.org/pdf/1506.02640.pdf
https://arxiv.org/pdf/1612.08242.pdf
https://pjreddie.com/darknet/
http://guanghan.info/blog/en/my-works/train-yolo/
if (1): hf.show_yolo()
#from IPython.display import YouTubeVideo
#YouTubeVideo('VOC3huqHrss', width=975, height=425)
import cv2
import time
import glob
import pickle
import numpy as np
import pandas as pd
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
from sklearn.utils import shuffle
from keras.models import Sequential
from keras.layers import Dense, Flatten, Lambda, MaxPooling2D, Cropping2D, Dropout
from keras.layers.convolutional import Convolution2D
from keras.layers.advanced_activations import ELU
from keras.optimizers import Adam, Nadam
from sklearn.model_selection import train_test_split
from skimage.feature import blob_log
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder, LabelEncoder
from keras.utils import np_utils
from sklearn.naive_bayes import GaussianNB
from IPython.display import YouTubeVideo
import helper_files as hf
%matplotlib inline
import sys
print(sys.version)
from tensorflow.python.client import device_lib
devices = device_lib.list_local_devices()
# assumes a GPU is present as the second local device
print (devices[1].physical_device_desc, devices[1].name)
import keras
print('Keras Version:',keras.__version__)
# loading list of mismatched training images
df_skip = pd.read_csv('./mismatched.csv')
# loading ground truth sea lion counts for training images
df = pd.read_csv('./input/train/train.csv')
df = df.drop('train_id', axis=1)
# using colors to refer to sea lion categories
df.columns = ['red', 'magenta', 'brown', 'blue', 'green']
df.describe()
classes = hf.get_correl(df)
if (1): hf.plot_visuals(classes)
def get_blob_coord(train_img):
    t = time.time()
    print ('Getting Color Blobs...')
    colors = ['red','magenta','brown','blue','green','error']
    df_blob = pd.DataFrame(index=df.index, columns=colors)
    count = 0
    R, G, B, color = [], [], [], []
    for fname in range(train_img[0],train_img[1]):
        count += 1
        if fname in list(df_skip['train_id']): continue
        img_1 = mpimg.imread('./input/Train/' + str(fname) + '.jpg')
        img_2 = mpimg.imread('./input/TrainDotted/' + str(fname) + '.jpg')
        # the difference image isolates the colored dots added in TrainDotted
        img_3 = cv2.absdiff(img_2,img_1)
        # mask out the blackened (unlabeled) regions of the dotted image
        mask_2 = cv2.cvtColor(img_2, cv2.COLOR_RGB2GRAY)
        mask_2[mask_2 < 20] = 0
        mask_2[mask_2 > 0] = 255
        img_4 = cv2.bitwise_or(img_3, img_3, mask=mask_2)
        img_5 = cv2.cvtColor(img_4, cv2.COLOR_RGB2GRAY)
        # Laplacian-of-Gaussian blob detection finds the dot centers
        blobs = blob_log(img_5, min_sigma=3, max_sigma=4, num_sigma=1, threshold=0.02)
        red, magenta, green, blue, brown, error = [], [], [], [], [], []
        #cut = np.copy(img_2)
        for blob in blobs:
            y, x, s = blob
            r,g,b = img_2[int(y)][int(x)][:]
            pred = predict_color(r,g,b)
            if pred[0] == 'red':
                red.append((int(y),int(x)))
                color.append('red')
            elif pred[0] == 'magenta':
                magenta.append((int(y),int(x)))
                color.append('magenta')
            elif pred[0] == 'brown':
                brown.append((int(y),int(x)))
                color.append('brown')
            elif pred[0] == 'blue':
                blue.append((int(y),int(x)))
                color.append('blue')
            elif pred[0] == 'green':
                green.append((int(y),int(x)))
                color.append('green')
            else:
                error.append((int(y),int(x)))
                color.append('error')
            R.append(int(r))
            G.append(int(g))
            B.append(int(b))
            #cv2.rectangle(cut, (int(x)-32,int(y)-32),(int(x)+32,int(y)+32), 0,-1)
        if count % 25 == 0:
            print ('File progress:',fname,len(red),len(magenta),len(brown),len(blue),len(green),len(error))
        df_blob.loc[fname,'red'] = red
        df_blob.loc[fname,'magenta'] = magenta
        df_blob.loc[fname,'brown'] = brown
        df_blob.loc[fname,'blue'] = blue
        df_blob.loc[fname,'green'] = green
        df_blob.loc[fname,'error'] = error
    df_color = pd.DataFrame({'R':R, 'G':G, 'B':B, 'color':color})
    hf.check_colors(df,df_color,train_img)
    print ('\nTotal Training Images:',train_img[1]-train_img[0])
    print ('Time for getting Sea Lion coordinates:', np.round(time.time() - t, 4),'\n')
    return R,G,B,color,df_color,df_blob
def predict_color(r,g,b):
    # hand-tuned RGB thresholds for the five dot colors
    if r > 200 and g < 50 and b < 50: pred = ('red',)
    elif r > 200 and g < 50 and b > 200: pred = ('magenta',)
    elif r < 100 and 150 < g < 200 and b < 100: pred = ('green',)
    elif r < 100 and g < 100 and b > 100: pred = ('blue',)
    elif r < 150 and g < 100 and b < 50: pred = ('brown',)
    else: pred = ('error',)
    return pred
if (1):
    # getting locations of sea lions (color dots) in TrainDotted/*.jpg
    # coordinates will be used to extract sea lion pics from Train/*.jpg
    train_img = (41,51) # (0,947)
    R,G,B,color,df_color,df_blob = get_blob_coord(train_img)
    pickle.dump(df_color,open('df_color.p','wb'))
    pickle.dump(df_blob,open('df_blob.p','wb'))
df_color = pickle.load(open('df_color.p','rb'))
df_blob = pickle.load(open('df_blob.p','rb'))
df_blob[43:47]
if (1): hf.plot_rgb_colors(df_color)
if (1): hf.plot_sample_images(df_blob)
def get_color_clf():
    # fit a Gaussian Naive Bayes classifier on the labeled RGB dot samples
    df_color = pickle.load(open('df_color.p','rb'))
    X_train, y_train = hf.drive_learning_curve(df_color)
    clf = GaussianNB()
    clf.fit(X_train,y_train.flatten())
    return clf

def predict_color(r,g,b):
    # redefined to use the trained classifier instead of RGB thresholds
    pred = clf.predict([np.array([r,g,b])])
    return pred
if (1):
    # getting locations of sea lions (color dots) in TrainDotted/*.jpg
    # coordinates will be used to extract sea lion pics from Train/*.jpg
    # using more accurate GaussianNB over if-then for color values
    clf = get_color_clf()
    train_img = (41,51) # (0,947)
    R,G,B,color,df_color,df_blob = get_blob_coord(train_img)
    pickle.dump(df_color,open('df_color2.p','wb'))
    pickle.dump(df_blob,open('df_blob2.p','wb'))
df_color = pickle.load(open('df_color2.p','rb'))
df_blob = pickle.load(open('df_blob2.p','rb'))
df_blob[43:47]
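As a quick sanity check (the RGB values below are illustrative, not taken from the data), the classifier-backed predict_color should map saturated colors to the expected dot classes:
# Illustrative spot checks of the GaussianNB color classifier.
print (predict_color(240, 20, 20)[0])   # expected: red
print (predict_color(230, 30, 220)[0])  # expected: magenta
print (predict_color(40, 40, 180)[0])   # expected: blue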
def resize_training_images():
    t = time.time()
    print ('Resizing: Test/*.jpg')
    fnames = glob.glob('./input/Test/*.jpg')
    count = 0
    for fname in fnames:
        count += 1
        if count % 100 == 0: print ('Working Test Image:',count)
        fname = fname.split('/')[-1] # keep only the file name
        img_1 = mpimg.imread('./input/Test/'+fname)
        # compress and resize images in multiples of 448 x 448 to feed Yolo
        #x, y = int(img_1.shape[1]*0.5), int(img_1.shape[0]*0.5)
        x, y = 6 * 448, 4 * 448
        img_1 = cv2.resize(img_1, (x,y))
        mpimg.imsave('./input_resize/Test/'+fname,img_1)
    print ('\nTime for resizing Test Images:', np.round(time.time() - t, 4),'\n')
    return None
if (1): hf.show_pic_resize()
if (0): resize_training_images()
def save_train_pics(train_img):
    # compare numbers of colors found from blob detection with true values
    def check_colors(fname):
        color_true, color_found, color_diff = list(df.loc[fname]), [], []
        for i in range(len(df_blob.loc[fname])-1): # skip the 'error' column
            color_found.append(len(df_blob.loc[fname][i]))
            color_diff.append(color_true[i] - color_found[i])
        return color_true, color_found, color_diff
    t = time.time()
    X_train, y_train = [], []
    # option to save colors with different sizes
    color_size = {'red':96,'magenta':96,'brown':96,'blue':96,'green':96}
    colors = ['red','magenta','brown','blue','green']
    # counter to index filenames
    counter = 0
    for fname in range(train_img[0],train_img[1]): # [0,947]
        # skipping training images with color differences from true values
        color_true, color_found, color_diff = check_colors(fname)
        if color_true != color_found:
            #print (fname, color_true, color_found, color_diff)
            continue
        img_1 = mpimg.imread('./input/Train/' + str(fname) + '.jpg')
        img_2 = mpimg.imread('./input/TrainDotted/' + str(fname) + '.jpg')
        for color in colors:
            crop = int(color_size[color]/2)
            for blob in df_blob.loc[fname][color]:
                y, x = blob[0], blob[1]
                range_y = range(crop, img_1.shape[0]-crop)
                range_x = range(crop, img_1.shape[1]-crop)
                if y in range_y and x in range_x:
                    img_train_1 = img_1[y-crop:y+crop,x-crop:x+crop,:]
                    img_train_2 = img_2[y-crop:y+crop,x-crop:x+crop,:]
                    counter += 1
                    #print (fname, counter, img_train_1.shape, y, x)
                    mpimg.imsave('./input/X_Train/'+ str(counter) + '.jpg', img_train_1)
                    mpimg.imsave('./input/X_TrainDotted/'+ str(counter) + '.jpg', img_train_2)
                    X_train.append(counter)
                    y_train.append(color)
                # getting random negative images
                rand_shift = 200
                y = y + np.random.randint(-rand_shift,rand_shift)
                x = x + np.random.randint(-rand_shift,rand_shift)
                range_y = range(crop, img_1.shape[0]-crop)
                range_x = range(crop, img_1.shape[1]-crop)
                if y in range_y and x in range_x:
                    img_train_1 = img_1[y-crop:y+crop,x-crop:x+crop,:]
                    img_train_2 = img_2[y-crop:y+crop,x-crop:x+crop,:]
                    counter += 1
                    mpimg.imsave('./input/X_Train/'+ str(counter) + '.jpg', img_train_1)
                    mpimg.imsave('./input/X_TrainDotted/'+ str(counter) + '.jpg', img_train_2)
                    X_train.append(counter)
                    y_train.append('random')
                #if counter == 150: break
    df_img = pd.DataFrame()
    df_img['fname'] = X_train
    df_img['labels'] = y_train
    print ('\nTime for saving training images:', np.round(time.time() - t, 4))
    print ('Number of Training Images:',len(df_img))
    return df_img
if (1):
    train_img = (41,51) # (0,947)
    df_img = save_train_pics(train_img)
    pickle.dump(df_img,open('df_img.p','wb'))
df_img = pickle.load(open('df_img.p','rb'))
def get_samples(df_img, colors):
    ''' randomly sample training data ==> we will expand with data augmentation '''
    num_colors = df_img['labels'].value_counts()
    balance_color, color_list = [], []
    print ('Numbers of samples in each Class:')
    for color in colors:
        print (color,num_colors[color])
        balance_color.append(num_colors[color])
    select = np.min(balance_color)
    print ('\nBalancing dataset (selecting):')
    for color in colors: print (color,select)
    red = df_img[df_img['labels'] == 'red']
    magenta = df_img[df_img['labels'] == 'magenta']
    brown = df_img[df_img['labels'] == 'brown']
    blue = df_img[df_img['labels'] == 'blue']
    green = df_img[df_img['labels'] == 'green']
    random = df_img[df_img['labels'] == 'random']
    # randomly select samples from each class
    if 'red' in colors:
        red = red.sample(n = select)
        color_list.append(red)
    if 'magenta' in colors:
        magenta = magenta.sample(n = select)
        color_list.append(magenta)
    if 'brown' in colors:
        brown = brown.sample(n = select)
        color_list.append(brown)
    if 'blue' in colors:
        blue = blue.sample(n = select)
        color_list.append(blue)
    if 'green' in colors:
        green = green.sample(n = select)
        color_list.append(green)
    if 'random' in colors:
        random = random.sample(n = select)
        color_list.append(random)
    df_train = pd.concat(color_list, ignore_index=True)
    num_colors = df_train['labels'].value_counts()
    return df_train
def get_train_data(pic_size, colors):
    t = time.time()
    df_train = get_samples(df_img, colors)
    images, category = [], []
    for i in range(len(df_train)): # start at 0 so the first sample is included
        fname = df_train.loc[i]['fname']
        label = df_train.loc[i]['labels']
        img_1 = mpimg.imread('./input/X_Train/' + str(fname) + '.jpg')
        if np.shape(img_1) != (pic_size[0], pic_size[1], 3): img_1 = cv2.resize(img_1, pic_size)
        category.append(label)
        images.append(img_1)
        augment = True
        if augment:
            # 8X augmentation: horizontal/vertical flips and 90-degree rotations
            img_2 = np.fliplr(img_1)
            img_3 = np.flipud(img_1)
            img_4 = np.flipud(img_2)
            img_5 = np.rot90(img_1)
            img_6 = np.fliplr(img_5)
            img_7 = np.flipud(img_5)
            img_8 = np.flipud(img_6)
            for _ in range(2,9): category.append(label) # labels for img_2..img_8
            images.append(img_2)
            images.append(img_3)
            images.append(img_4)
            images.append(img_5)
            images.append(img_6)
            images.append(img_7)
            images.append(img_8)
    X_train = np.array(images)
    y_train = np.array(category)
    # one-hot encode the string labels
    encoder = LabelEncoder()
    encoder.fit(y_train)
    encoded_Y = encoder.transform(y_train)
    y_train = np_utils.to_categorical(encoded_Y)
    print ()
    print ('X_train shape:',X_train.shape)
    print ('y_train shape:',y_train.shape)
    print ('\nTime for loading training images:', np.round(time.time() - t, 4))
    return X_train, y_train
if (1):
    pic_size = (64,64)
    colors = ['red','random']
    X_train, y_train = get_train_data(pic_size, colors)
    hf.show_augmented_data()
pic_size = (64,64)
colors = ['red','random']
X_train, y_train = get_train_data(pic_size, colors)
for i in range(5):
    idx = np.random.randint(0,len(X_train))
    img_1 = np.copy(X_train[idx])
    HSV = cv2.cvtColor(img_1, cv2.COLOR_RGB2HSV)
    YUV = cv2.cvtColor(img_1, cv2.COLOR_RGB2YUV)
    GRY = cv2.cvtColor(img_1, cv2.COLOR_RGB2GRAY)
    HLS = cv2.cvtColor(img_1, cv2.COLOR_RGB2HLS)
    YCR = cv2.cvtColor(img_1, cv2.COLOR_RGB2YCrCb)
    plt.figure(figsize=(18,6))
    plt.subplot(1,6,1)
    plt.imshow(img_1)
    plt.subplot(1,6,2)
    plt.imshow(HSV)
    plt.subplot(1,6,3)
    plt.imshow(YUV)
    plt.subplot(1,6,4)
    plt.imshow(GRY, cmap='gray') # single-channel image needs a gray colormap
    plt.subplot(1,6,5)
    plt.imshow(HLS)
    plt.subplot(1,6,6)
    plt.imshow(YCR)
    plt.show()
if (1): hf.show_nvidia_model()
def get_model(input_shape, classes):
    # NVIDIA-style architecture: normalization, three conv layers, four dense layers
    model = Sequential()
    #model.add(Lambda(lambda x: x / 127.5 - 1.0, input_shape=input_shape)) # (0,255) --> (-1,1)
    model.add(Lambda(lambda x: (x / 255.0) - 0.5, input_shape=input_shape))
    model.add(Convolution2D(24, (5, 5), padding='valid', strides=(2, 2), activation='elu'))
    model.add(Convolution2D(36, (5, 5), padding='valid', strides=(2, 2), activation='elu'))
    model.add(Convolution2D(48, (5, 5), padding='valid', strides=(2, 2), activation='elu'))
    #model.add(Convolution2D(64, (3, 3), padding='valid', strides=(1, 1), activation='elu'))
    #model.add(Convolution2D(64, (3, 3), padding='valid', strides=(1, 1), activation='elu'))
    model.add(Flatten())
    model.add(Dense(100, activation='elu'))
    model.add(Dense(50, activation='elu'))
    model.add(Dense(10, activation='elu'))
    model.add(Dense(classes, activation='softmax'))
    return model
def histogram_equalize(img):
    r, g, b = cv2.split(img)
    red = cv2.equalizeHist(r)
    green = cv2.equalizeHist(g)
    blue = cv2.equalizeHist(b)
    return cv2.merge((red, green, blue))
def preprocess_data(X_train, y_train):
    X = []
    for i in range(len(X_train)):
        ### We can resize OR crop images OR both!! ###
        img_1 = histogram_equalize(np.copy(X_train[i]))
        img_1 = cv2.cvtColor(img_1, cv2.COLOR_RGB2HLS)
        X.append(img_1)
    X_train = np.array(X)
    return X_train, y_train
def run_model(pic_size, colors, X_train, y_train):
    # note: a fresh stratified sample is drawn here, so the incoming
    # X_train/y_train arguments are replaced before training
    X_train, y_train = get_train_data(pic_size, colors)
    X_train, y_train = preprocess_data(X_train, y_train)
    model = get_model(input_shape = (pic_size[0], pic_size[1], 3), classes = len(y_train[0]))
    t = time.time()
    model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['categorical_accuracy'])
    history = model.fit(X_train, y_train, validation_split=0.20, shuffle=True, epochs=5, verbose=1)
    hf.plot_loss(history)
    print ('\nTime for Training Model:', np.round(time.time() - t, 4))
pic_size = (64,64)
colors = ['red','random']
run_model(pic_size, colors, X_train, y_train)
pic_size = (64,64)
colors = ['magenta','random']
run_model(pic_size, colors, X_train, y_train)
pic_size = (64,64)
colors = ['brown','random']
run_model(pic_size, colors, X_train, y_train)
pic_size = (64,64)
colors = ['blue','random']
run_model(pic_size, colors, X_train, y_train)
pic_size = (64,64)
colors = ['green','random']
run_model(pic_size, colors, X_train, y_train)
The Stratified Random Sampling section is currently written to select only the number of samples given by the smallest class being considered (which keeps the classes balanced). The data augmentation expands the dataset by 8X, so the SRS section should be rewritten to allow for inputting the desired number of training samples (which can have an upper limit of 8X the number in the smallest class), as sketched below.
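A possible sketch of that change (get_samples_n and n_samples are hypothetical names; the cap reflects the raw patches available per class, which the 8X augmentation in get_train_data later multiplies):
def get_samples_n(df_img, colors, n_samples):
    # Hypothetical variant of get_samples: draw a requested number of raw
    # patches per class instead of always using the smallest class size.
    counts = df_img['labels'].value_counts()
    cap = min(counts[c] for c in colors) # raw patches available per class
    n = min(n_samples, cap)              # 8X augmentation later lifts the
                                         # effective ceiling to 8 * cap
    parts = [df_img[df_img['labels'] == c].sample(n=n) for c in colors]
    return pd.concat(parts, ignore_index=True)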
One particularly interesting way to further augment the dataset would be Random Cropping to offset the training images after the transformations above. Random Cropping towards the corners of each extracted image would create 4X more pics for training; see the sketch below. Transformations (8X) and Random Cropping (4X) together would expand the dataset by 32X.
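A sketch of how the corner crops could look (assuming the saved patches are 96 x 96, per color_size above, and the network input is 64 x 64; corner_crops is a hypothetical helper):
def corner_crops(img, out_size=64):
    # Four corner crops of a larger training patch, each offsetting the
    # sea lion from the center of the extracted image.
    h, w = img.shape[:2]
    return [img[:out_size, :out_size],          # top-left
            img[:out_size, w - out_size:],      # top-right
            img[h - out_size:, :out_size],      # bottom-left
            img[h - out_size:, w - out_size:]]  # bottom-right
Applying the eight flip/rotation transformations to each of the four crops yields the 32X expansion.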
There are 83,677 sea lions in the 1000 aerial photographs provided for training. The smallest class (magenta, subadult males) has only 4,345 instances. Using full data augmentation (32X), this would allow 139,040 training images per class to be created for a balanced dataset. The present model has only been trained on sea lions from 10 aerial photographs.
Darknet-19 runs at about 75 fps on (448 x 448) inputs on a Titan X. Resizing each Test Image to (2688 x 1792) would reduce file size by about 70% and allows each image to be partitioned into 24 (448 x 448) tiles to feed the Darknet. Processing 18,000 Test Images x 24 Tiles at 75 fps should take about 1.6 hours; see the sketch below.
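A sketch of that tiling (2688 x 1792 splits exactly into 6 x 4 = 24 tiles of 448 x 448, matching the resize above; tile_image is a hypothetical helper):
def tile_image(img, tile=448):
    # Partition a resized test image into non-overlapping tile x tile blocks.
    rows, cols = img.shape[0] // tile, img.shape[1] // tile
    return [img[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            for r in range(rows) for c in range(cols)]

# 18,000 images * 24 tiles / 75 fps = 5,760 seconds, about 1.6 hours
print (18000 * 24 / 75 / 3600, 'hours')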